
    Optimization of miRNA-seq data preprocessing.

    The past two decades of microRNA (miRNA) research have solidified the role of these small non-coding RNAs as key regulators of many biological processes and as promising biomarkers for disease. The concurrent development of high-throughput profiling technology has further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Data preprocessing is often overlooked, yet it is an essential step in data analysis: unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign) and normalization methods (counts-per-million, total count scaling, upper quartile scaling, trimmed mean of M-values, DESeq, linear regression, cyclic loess and quantile) on the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
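As a rough illustration of two of the normalization methods named in this abstract, the sketch below applies counts-per-million and upper quartile scaling to a single sample. It is not the study's pipeline: the function names are hypothetical, and computing the upper quartile over non-zero counts with linear interpolation is an assumption made for this example.

```python
def counts_per_million(counts):
    """Scale raw counts so that the sample sums to one million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]


def upper_quartile(values):
    """75th percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * 0.75
    lo = int(k)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def upper_quartile_scale(counts):
    """Divide each count by the upper quartile of the non-zero counts
    (assumption: zeros are excluded before taking the quartile)."""
    nonzero = [c for c in counts if c > 0]
    if not nonzero:
        raise ValueError("sample has no non-zero counts")
    q = upper_quartile(nonzero)
    return [c / q for c in counts]


# Illustrative counts for four miRNAs in one sample (made-up numbers).
sample = [10, 0, 30, 60]
cpm = counts_per_million(sample)
uq = upper_quartile_scale(sample)
```

Both methods rescale each sample by a single factor; they differ only in which summary of the sample (total count or upper quartile) defines that factor.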

    ISOWN: accurate somatic mutation identification in the absence of normal tissue controls.

    Background: A key step in cancer genome analysis is the identification of somatic mutations in the tumor. This is typically done by comparing the genome of the tumor to the reference genome sequence derived from a normal tissue taken from the same donor. However, there are a variety of common scenarios in which matched normal tissue is not available for comparison.
    Results: In this work, we describe an algorithm that uses a machine learning approach to distinguish somatic single nucleotide variants (SNVs) from germline polymorphisms in next-generation sequencing data in the absence of normal samples. Our algorithm was evaluated using a family of supervised learning classifications across six different cancer types and ~1600 samples, including cell lines, fresh frozen tissues, and formalin-fixed paraffin-embedded tissues; we tested our algorithm with both deep targeted and whole-exome sequencing data. Our algorithm correctly classified between 95 and 98% of somatic mutations, with F1-measures ranging from 75.9 to 98.6% depending on the tumor type. We have released the algorithm as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues).
    Conclusions: In this work, we describe the development, implementation, and validation of ISOWN, an accurate algorithm for predicting somatic mutations in cancer tissues in the absence of matching normal tissues. ISOWN is available at https://github.com/ikalatskaya/ISOWN as open source under the Apache License 2.0
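The F1-measure reported in this abstract is the harmonic mean of precision and recall. The sketch below computes it from classification counts; the function name and the counts are made up for illustration and are not taken from the ISOWN evaluation.

```python
def f1_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall for the 'somatic' class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 somatic SNVs called correctly, 10 germline
# variants miscalled as somatic, 30 somatic SNVs missed.
f1 = f1_measure(true_pos=90, false_pos=10, false_neg=30)
```

Because F1 penalizes false positives as well as missed calls, it can be noticeably lower than the fraction of somatic mutations correctly classified, which is one way to read the gap between the two ranges quoted above.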

    Overpressures in the Uinta Basin, Utah: analysis using a three-dimensional basin evolution model

    High pore fluid pressures, approaching lithostatic, are observed in the deepest sections of the Uinta Basin, Utah. Geologic observations and previous modeling studies suggest that the most likely cause of the observed overpressure is hydrocarbon generation. We studied Uinta overpressure by developing and applying a three-dimensional numerical model of the evolution of the basin. The model was developed from a public domain computer code, with the addition of a new mesh generator that builds the basin through time, coupling the structural, thermal, and hydrodynamic evolution. Also included in the model are in situ hydrocarbon generation and multiphase migration. The modeling study affirmed oil generation as an overpressure mechanism but also elucidated the relative roles of multiphase fluid interaction, oil density and viscosity, and sedimentary compaction. An important result is that overpressures caused by oil generation create conditions for rock fracturing, and the associated fracture permeability may regulate or control the propensity to maintain overpressures.

    Exome sequencing identifies nonsegregating nonsense ATM and PALB2 variants in familial pancreatic cancer.

    We sequenced 11 germline exomes from five families with familial pancreatic cancer (FPC). One proband had a germline nonsense variant in ATM with somatic loss of the variant allele. Another proband had a nonsense variant in PALB2 with somatic loss of the variant allele. Both variants were absent in a relative with FPC. These findings question the causal mechanisms of ATM and PALB2 in these families and highlight challenges in identifying the causes of familial cancer syndromes using exome sequencing.

    A cancer cell-line titration series for evaluating somatic classification.

    Background: Accurate detection of somatic single nucleotide variants and small insertions and deletions from DNA sequencing experiments of tumour-normal pairs is a challenging task. Tumour samples are often contaminated with normal cells, confounding the available evidence for the somatic variants. Furthermore, tumours are heterogeneous, so sub-clonal variants are observed at reduced allele frequencies. We present here a cell-line titration series dataset that can be used to evaluate somatic variant calling pipelines with the goal of reliably calling true somatic mutations at low allele frequencies.
    Results: Cell-line DNA was mixed with matched normal DNA at 8 different ratios to generate samples with known tumour cellularities, and exome sequenced on Illumina HiSeq to depths of >300×. The data were processed with several different variant calling pipelines, and verification experiments were performed to assay >1500 somatic variant candidates using Ion Torrent PGM as an orthogonal technology. By examining the variants called at varying cellularities and depths of coverage, we show that the best performing pipelines are able to maintain a high level of precision at any cellularity. In addition, we estimate the number of true somatic variants undetected as cellularity and coverage decrease.
    Conclusions: Our cell-line titration series dataset, along with the associated verification results, was effective for this evaluation and will serve as a valuable dataset for future somatic calling algorithm development. The data are available for further analysis at the European Genome-phenome Archive under accession number EGAS00001001016. Data access requires registration through the International Cancer Genome Consortium's Data Access Compliance Office (ICGC DACO).
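To see why low cellularity makes somatic variants hard to call, the sketch below estimates the expected variant allele fraction of a clonal variant in a tumour-normal mixture. It is a simplified textbook model, not the authors' method: the function name, the default copy numbers, and the assumption that the mixing ratio equals the tumour cell fraction are all assumptions of this example.

```python
def expected_vaf(cellularity, mutant_copies=1, tumour_cn=2, normal_cn=2):
    """Expected variant allele fraction for a clonal somatic variant.

    cellularity   -- fraction of cells that are tumour (0..1)
    mutant_copies -- copies of the variant allele per tumour cell
    tumour_cn     -- total copy number at the locus in tumour cells
    normal_cn     -- total copy number at the locus in normal cells
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    variant_alleles = cellularity * mutant_copies
    all_alleles = cellularity * tumour_cn + (1 - cellularity) * normal_cn
    return variant_alleles / all_alleles


# A heterozygous variant in a diploid region at 20% cellularity is
# expected in about 10% of reads, i.e. about 30 of 300 reads.
vaf = expected_vaf(0.2)
supporting_reads = 300 * vaf
```

Under these assumptions the expected allele fraction falls linearly with cellularity, so each dilution step leaves fewer variant-supporting reads at a fixed depth.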

    Distinct routes of lineage development reshape the human blood hierarchy across ontogeny

    In a classical view of hematopoiesis, the various blood cell lineages arise via a hierarchical scheme starting with multipotent stem cells that become increasingly restricted in their differentiation potential through oligopotent and then unipotent progenitors. We developed a cell-sorting scheme to resolve myeloid (My), erythroid (Er), and megakaryocytic (Mk) fates from single CD34+ cells and then mapped the progenitor hierarchy across human development. Fetal liver contained large numbers of distinct oligopotent progenitors with intermingled My, Er and Mk fates. However, few oligopotent progenitor intermediates were present in the adult bone marrow. Instead, only two progenitor classes predominate, multipotent and unipotent, with Er-Mk lineages emerging from multipotent cells. The developmental shift to an adult ‘two-tier’ hierarchy challenges current dogma and provides a revised framework to understand normal and disease states of human hematopoiesis.
    This work was supported by Postdoctoral Fellowship Awards from the Canadian Institutes of Health Research (CIHR) to FN and SZ. SZ is supported by (Aplastic Anemia). FN is a recipient of a scholar’s research award from the Ontario Institute for Cancer Research (OICR), through generous support from the Ontario Ministry of Research and Innovation. Research in the EL laboratory is supported by a Wellcome Trust Sir Henry Dale Fellowship and a core support grant from the Wellcome Trust and MRC to the Wellcome Trust – Medical Research Council Cambridge Stem Cell Institute. Work in the Dick laboratory is supported by grants from the CIHR, Canadian Cancer Society, Terry Fox Foundation, Genome Canada through the Ontario Genomics Institute, OICR with funds from the province of Ontario, a Canada Research Chair and the Ontario Ministry of Health and Long Term Care (OMOHLTC).
    This is the author accepted manuscript. The final version is available from AAAS via http://dx.doi.org/10.1126/science.aab211

    Identification of genes expressed by immune cells of the colon that are regulated by colorectal cancer-associated variants.

    A locus on human chromosome 11q23 tagged by marker rs3802842 was associated with colorectal cancer (CRC) in a genome-wide association study; this finding has been replicated in case-control studies worldwide. To identify biologic factors at this locus that are related to the etiopathology of CRC, we used microarray-based target selection methods, coupled to next-generation sequencing, to study 103 kb at the 11q23 locus. We genotyped 369 putative variants from 1,030 patients with CRC (cases) and 1,061 individuals without CRC (controls) from the Ontario Familial Colorectal Cancer Registry. Two previously uncharacterized genes, COLCA1 and COLCA2, were found to be co-regulated and transcribed from opposite strands. Expression levels of COLCA1 and COLCA2 transcripts correlate with rs3802842 genotypes. In colon tissues, COLCA1 co-localizes with crystalloid granules of eosinophils and granular organelles of mast cells, neutrophils, macrophages, dendritic cells and differentiated myeloid-derived cell lines. COLCA2 is present in the cytoplasm of normal epithelial, immune and other cell lineages, as well as tumor cells. Tissue microarray analysis demonstrates the association of rs3802842 with lymphocyte density in the lamina propria (p = 0.014) and with levels of COLCA1 in the lamina propria (p = 0.00016) and of COLCA2 (tumor cells, p = 0.0041; lamina propria, p = 6 × 10⁻⁵). In conclusion, genetic, expression and immunohistochemical data implicate COLCA1 and COLCA2 in the pathogenesis of colon cancer. Histologic analyses indicate the involvement of immune pathways.